This a template for an analysis notebook using RMarkdown.
In this notebook, we will set up parameters in the seurat workflow [normalization –> reduction –> clustering] for one Wilms tumor sample (SCPCS000169) of the Wilms Tumor dataset (SCPCP000006).
This correspond to the step 2 of the proposed analysis: clustering of cells across a set of parameters for few samples
Load required packages in the following chunk, if needed. Do not
install packages here; only load them with the library()
function.
library(SingleCellExperiment)
library(Seurat) # to use Seurat workflow
library(tidyr)
library(dplyr)
library(DT) # for table visualization
library(ggplot2) # for visualization
library(patchwork) # for visualization
library(ggplotify)
library(SCpubr) # for visualization
library(viridis) # for visualization - colors
library(edgeR) # For pseudobulk based differential expression (DE) analysis
library(DElegate) # for pseudobulk based DE analysis and identification of marker genes
library(Azimuth) # For label transferWe store in a config list names “cfg” parameters used all along the analysis to filter for p-value, log fold change, percentage of expression, etc.
In the chunk below, we include some common directories that you might
need, finding their locations using the very useful rprojroot
package.
Note for the DataLab, not sure what to do with this, I decided to use paths below.
In our case, the SCPCP000006 dataset downloaded the 25-06-2024 is locally saved in /mnt_data/Wilms ALSF/SCPCP000006_2024-06-25. Please mount your local folder of data accordingly or change the path below. Ideally, the mnt_data is set up in the config.yaml file
# input
# please mount your local folder of data accordingly or change the path below. ideally, the mnt_data is set up in the config.yaml file
path_to_data <- "~/mnt_data/Wilms ALSF/SCPCP000006_2024-06-25"
# get all files in the data directory
filelist <- list.files(path_to_data, full.names = TRUE)
# select the 40 Wilms tumor single nucleus folders only
filelist <- filelist[grepl("/mnt_data/Wilms ALSF/SCPCP000006_2024-06-25/SCPC", filelist)]
# output
## html report for each of the Wilms tumor sample will be saved in
path_to_report <- "~/notebook/01-clustering"
## some plots from the report will be saved in
path_to_plot <- "~/plots"
## after final decision on the clustering strategy, processed RDS files will be saved in
path_to_result <- "~/results"## to be modified when rendered
i = 2
filedir <- filelist[i]
sample <- gsub("/home/rstudio/mnt_data/Wilms ALSF/SCPCP000006_2024-06-25/", "", filedir)
metadata <- read.table("SCPCP000006_metadata.tsv", sep = "\t", header=TRUE)
metadata[metadata$scpca_sample_id == sample,]| scpca_project_id | scpca_sample_id | scpca_library_id | diagnosis | subdiagnosis | disease_timing | age_at_diagnosis | sex | tissue_location | participant_id | submitter | submitter_id | organism | development_stage_ontology_term_id | sex_ontology_term_id | organism_ontology_id | self_reported_ethnicity_ontology_term_id | disease_ontology_term_id | tissue_ontology_term_id | metastasis | relapse_status | treatment | vital_status | seq_unit | technology | total_reads | mapped_reads | sample_cell_count_estimate | unfiltered_cells | filtered_cell_count | processed_cells | has_cellhash | includes_anndata | is_cell_line | is_multiplexed | is_xenograft | pi_name | project_title | genome_assembly | mapping_index | alevin_fry_version | salmon_version | transcript_type | droplet_filtering_method | cell_filtering_method | prob_compromised_cutoff | min_gene_cutoff | normalization_method | date_processed | workflow | workflow_version | workflow_commit | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | SCPCP000006 | SCPCS000169 | SCPCL000206 | Wilms tumor | Favorable | Initial diagnosis | 2 | M | Kidney | SJWLM059660 | murphy_chen | SJWLM059660_D1 | Homo sapiens | HsapDv:0000096 | PATO:0000384 | NCBITaxon:9606 | unknown | MONDO:0006058 | UBERON:0002113 | NA | No | Upfront resection | Alive | nucleus | 10Xv3.1 | 139450314 | 28626608 | 7102 | 66522 | 7102 | 6394 | False | True | False | False | False | murphy_chen | Single nuclear RNA-seq and spatial transcriptomic analysis of anaplastic and favorable histology Wilms tumor | Homo_sapiens.GRCh38.104 | Homo_sapiens.GRCh38.104.spliced_intron.txome | 0.7.0 | 1.5.2 | [total, spliced] | emptyDropsCellRanger | miQC | 0.75 | 200 | deconvolution | 2024-03-18T19:44:31+0000 | https://github.com/AlexsLemonade/scpca-nf | v0.8.0 | 8bef82d853d19e5aeddd75401aa54cf8bfbced13 |
We perform the following analysis to assess for the quality of clustering:
[1] We perform some quality check to assess any QC-induced clustering (nFeature, nCount, percent.mito).
[2] We add cell cycle information, as we know that in a specific cell cycle state, the transcriptional program is mostly/exclusively related to cell cycle genes and the identity of cells is difficult to determine. We expect these cells to cluster together in a cluster of proliferating cells.
[3] We run DElegate::FindAllMarkers2 to find markers of the different clusters and manually check if they do make sense. DElegate::FindAllMarkers2 is an improved version of Seurat::FindAllMarkers based on pseudobulk differential expression method.
[4] We look at specific marker genes that we reported in the table marker.sets/CellType_metadata.csv to check the relevance of the clustering.
[5] We plot pca/umap reduction grouping with available annotations (singler_, cellassign_). We expect at least immune cells to be correctly label and fall into a few set of clusters.
[6] We run label transfer (Azimuth) to transfer annotation from the fetal kidney atlas human reference. We plot pca/umap reduction grouping with latest labels. We expect it to be the most representative of the cell types in the sample.
Please note: to keep the notebook as straight as possible, we decided to show the analysis for the selected set of parameters:
These other parameters have been tested but not selected:
For this analysis, we worked with the _processed.rds data. We builded a Seurat object based on the counts data and re-perform the analysis [normalization –> reduction –> clustering] following the Seurat workflow.
We transferred meta.data to keep:
[x] QC data computed by the DataLab
[x] annotation data computed by the DataLab
[x] raw annotation and gene_symbol conversion
# convert to seurat
srat <- CreateSeuratObject(counts = counts(sce),
assay = "RNA",
project = file)
# convert colData and rowData to data.frame for use in the Seurat object
cell_metadata <- as.data.frame(colData(sce))
row_metadata <- as.data.frame(rowData(sce))
# add cell metadata (colData) from SingleCellExperiment to Seurat
srat@meta.data <- cell_metadata
# add row metadata (rowData) from SingleCellExperiment to Seurat
srat[["RNA"]]@meta.data <- row_metadata
# add metadata from SingleCellExperiment to Seurat
srat@misc <- metadata(sce)
# Normalization
options(future.globals.maxSize= 891289600)
srat <- SCTransform(srat, verbose = F, conserve.memory = TRUE)
# dimensionality reduction
srat <- RunPCA(srat, verbose = F)
srat <- RunUMAP(srat, dims = 1:50, verbose = F)## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
# clustering
srat <- FindNeighbors(srat, dims = 1:50, verbose = F)
srat <- FindClusters(srat, verbose = T)## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 6394
## Number of edges: 269470
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7964
## Number of communities: 15
## Elapsed time: 0 seconds
d2 <- SCpubr::do_DimPlot(srat, reduction="umap", group.by = "seurat_clusters", label = TRUE) + ggtitle("Seurat Cluster - umap")
d1 <- SCpubr::do_DimPlot(srat, reduction="pca", group.by = "seurat_clusters", label = TRUE) + ggtitle("Seurat Cluster - pca")
d1|d2The umap clustering is expected for a triphasic Wilms tumor sample. The 3 main clusters might be:
The 2 small clusters C11 and C13 might be immune and endothelial cells.
To be checked in the following analysis!
SCpubr::do_ViolinPlot(srat, features = c("subsets_mito_percent"), ncol = 3, group.by = "seurat_clusters", legend.position = "none") s.genes <- srat@assays$RNA@meta.data$gene_ids[srat@assays$RNA@meta.data$gene_symbol %in% cc.genes$s.genes]
g2m.genes <- srat@assays$RNA@meta.data$gene_ids[srat@assays$RNA@meta.data$gene_symbol %in% cc.genes$g2m.genes]
srat <- CellCycleScoring(srat, s.features = s.genes, g2m.features = g2m.genes, set.ident = FALSE)d2 <- SCpubr::do_DimPlot(srat, reduction="umap", group.by = "Phase", label = TRUE) + ggtitle("Phase - umap")
d1 <- SCpubr::do_DimPlot(srat, reduction="umap", group.by = "seurat_clusters", label = TRUE) + ggtitle("Seurat Cluster - umap")
d1|d2 SCpubr::do_ViolinPlot(srat, features = c("S.Score", "G2M.Score"), ncol = 2, group.by = "seurat_clusters", legend.position = "none") Cluster 3 might be composed of cycling cells, in higher proportion than the other clusters.
Here, we open the table of marker genes marker-sets/CellType_metadata.csv.
# open the set of reference marker genes
CellType_metadata <- read.table("marker-sets/CellType_metadata.csv", header = TRUE, sep = ",")
DT::datatable(CellType_metadata, caption = ("CellType_metadata"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel'))) SCpubr::do_ViolinPlot(srat, features = rownames(srat)[rownames(srat) %in% (CellType_metadata$ENSEMBL_ID[CellType_metadata$cell_class %in% c("malignant")])] , ncol = 6, group.by = "seurat_clusters", legend.position = "none")
The set of clusters 0-1-3-4-6-10-12-14 is NCAM1+ (ENSG00000149294) which
is a marker of blastema area. C2 is positive for COL6A3 which is a
marker of stroma Wilms tumor cells (also normal stroma)
SCpubr::do_FeaturePlot(srat, features = rownames(srat)[rownames(srat) %in% (CellType_metadata$ENSEMBL_ID[CellType_metadata$cell_class %in% c("malignant")])]) SCpubr::do_ViolinPlot(srat, features = rownames(srat)[rownames(srat) %in% (CellType_metadata$ENSEMBL_ID[CellType_metadata$cell_class %in% c("immune")])]
, ncol = 6, group.by = "seurat_clusters", legend.position = "none") SCpubr::do_FeaturePlot(srat, features = rownames(srat)[rownames(srat) %in% (CellType_metadata$ENSEMBL_ID[CellType_metadata$cell_class %in% c("immune")])])
C11 is PTPRC+ (ENSG00000081237) and must be a cluster of immune cells.
Wilms tumor are generally cold tumor, especially untreated Wilms tumor.
According to the clinical data, the present sample is untreated
(treatment = upfront resection). We do not expect high percentage of
immune cells. The size of the cluster would therefore make sense.
SCpubr::do_ViolinPlot(srat, features = rownames(srat)[rownames(srat) %in% (CellType_metadata$ENSEMBL_ID[! CellType_metadata$cell_class %in% c("immune", "malignant")])]
, ncol = 6, group.by = "seurat_clusters", legend.position = "none") SCpubr::do_FeaturePlot(srat, features = rownames(srat)[rownames(srat) %in% (CellType_metadata$ENSEMBL_ID[! CellType_metadata$cell_class %in% c("immune", "malignant")])], ncol = 3)C13 is VWF+ (ENSG00000110799) and must be a cluster of endothelial cells.
In addition to the list of known marker genes, we used an unbiased approach to find transcripts that characterized the different clusters. We run DElegate::FindAllMarkers2 to find markers of the different clusters and manually check if they do make sense. DElegate::FindAllMarkers2 is an improved version of Seurat::FindAllMarkers based on pseudobulk differential expression method. Please check the preprint from Chistoph Hafemeister: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v1 and tool described here: https://github.com/cancerbits/DElegate
feature_conversion <- srat@assays$RNA@meta.data
de_results <- DElegate::FindAllMarkers2(srat, group_column = "seurat_clusters")
#filter the most relevant markers
s.markers <- de_results[de_results$padj < cfg$padj_thershold & de_results$log_fc > cfg$lfc_threshold & de_results$rate1 > cfg$rate1_threshold,]
# add gene symbol for easiest interpretation of the result
s.markers$gene_ids <- s.markers$feature
s.markers <- left_join(s.markers,feature_conversion, by = c( "gene_ids") )
identical(s.markers$feature, s.markers$gene_ids) # check the quality of the merge, must be true## [1] TRUE
DT::datatable(s.markers, caption = ("marker genes"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel')))s.markers <- na.omit(s.markers)
s.markers %>%
group_by(group1) %>%
top_n(n = 5, wt = log_fc) -> top5
# subset for plotting
cells <- WhichCells(srat, downsample = 100)
ss <- subset(srat, cells = cells)
ss <- ScaleData(ss, c( top5$feature))
p1 <- SCpubr::do_DimPlot(srat, reduction="umap", group.by = "seurat_clusters", label = TRUE) + ggtitle("Seurat Cluster - umap")
p2 <- DoHeatmap(ss, features = top5$feature, cells = cells, group.by = "seurat_clusters") + NoLegend() +
scale_fill_gradientn(colors = c("#01665e","#35978f",'darkslategray3', "#f7f7f7", "#fee391","#fec44f","#F9AD03"))
p3 <- ggplot(srat@meta.data, aes(seurat_clusters, fill = seurat_clusters)) + geom_bar() + NoLegend()
p4 <- VlnPlot(srat, features = c("nFeature_RNA", "nCount_RNA", "subsets_mito_percent"), ncol = 3)
common_title <- sprintf("Unsupervised clustering %s, %d cells", srat@meta.data$orig.ident[1], ncol(srat))
show((((p1 / p3) + plot_layout(heights = c(3,2)) | p2) / p4) + plot_layout(widths = c(1, 2)) + plot_layout(heights = c(3,1)) + plot_annotation(title = common_title))DT::datatable(top5[,c(1,9,11,12)], caption = ("top 5 marker genes"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel')))The cluster that we found seems to make sense as they can be defined by a few number of specific genes each. I don’t know most of the marker genes found, but few point below that could support the analysis and annotation.
UNC5D (C1, dependency receptor), has been described in cancer and shown to regulate p53-dependent apoptosis (https://pubmed.ncbi.nlm.nih.gov/24691657/). We know that especially in blastema cell, p53 signaling play an important role.
PDCH7 (C7, protocadherin 7) is expressed in human kidney renal tubule (normal kidney epithelium, see https://www.proteinatlas.org/ENSG00000169851-PCDH7/tissue/kidney) and has been associated with oncogenic function in lung cancer (https://pubmed.ncbi.nlm.nih.gov/30409919/, https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2023.1217213/full).
EBF1 (C8, Early B Cell factor 1) has been shown to play a role in kidney development, esp. glomerular, while expressed in stroma progenitor (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6727263/).
ROR1 (C8, Receptor tyrosine kinase-like orphan receptor 1) ROR1 is only detectable in embryonic tissue and generally absent in mature tissue. It has been associated with cancer (https://academic.oup.com/proteincell/article/5/7/496/6831810) and cancer stroma (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5367847/).
ERBB4 (C9, EGF receptor family) is expressed during kidney development of epithelial tubule and uteric bud (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269924/)
FLT1 (C13, vascular endothelial growth factor receptor 1), PCAM and ELMO1 in the top5 of C13 marker genes confirm the identification of endothelial cells (https://www.ahajournals.org/doi/pdf/10.1161/CIRCRESAHA.109.213983, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4986701/)
Here, we quickly checked annotations that are present in the _processed rds object. However, the automated annotation have not been performed using a cancer specific reference or a kidney reference. We do not expect a nice labelling of the cells as the overlap of cell types between the reference and the query dataset is poor. This support the need to do a proper label transfer from the fetal kidney atlas, which is imho the best reference that can be applied to a Wilms tumor query.
d2 <-DimPlot(srat, group.by = "singler_celltype_annotation", reduction = "umap", label = TRUE, repel = TRUE) + NoLegend()
DT::datatable(table(srat$seurat_clusters, srat$singler_celltype_annotation), caption = ("table of SingleR annotation per seurat clusters"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel')))d3 <- SCpubr::do_BarPlot(sample = srat,
group.by = "singler_celltype_annotation",
split.by = "seurat_clusters",
position = "fill",
font.size = 10,
legend.ncol = 3) +
ggtitle("% cells")+
xlab(file)
d2|d3## Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
d2 <-DimPlot(srat, group.by = "cellassign_celltype_annotation", reduction = "umap", label = TRUE, repel = TRUE) + NoLegend()
DT::datatable(table(srat$seurat_clusters, srat$cellassign_celltype_annotation)
, caption = ("table of cellassign annotation per seurat clusters"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel')))d3 <- SCpubr::do_BarPlot(sample = srat,
group.by = "cellassign_celltype_annotation",
split.by = "seurat_clusters",
position = "fill",
font.size = 10,
legend.ncol = 3) +
ggtitle("% cells")+
xlab(file)
d2|d3## Warning: ggrepel: 8 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Even if the result of SingleR and CellAssign are not specific to a Wilms tumor dataset, it confirm once again that C13 is a cluster of endothelial cells and C11 a cluster of immune cells.
Note to the DataLab : where should I save the fetal kidney reference? In S3 like results as it is a quite big object or in the folder marker-sets?
For more information related to the reference, please go to https://www.kidneycellatlas.org/ You will find:
interactive viewer
h5ad files to download.
Please note that as Wilms tumor have been described to be closer to fetal kidney as mature kidney, we only used the fetal kidney atlas as the reference. Also check : https://www.science.org/doi/10.1126/science.aat5031
# access to the azimuth kidney reference
##### to be updated depending on the location of the reference file
ref_dir = "~/mnt_resources/datasets/Kidney_cell_atlas/fetal_full/Azimuth_Compatible_Fetal_full"
reference <- LoadReference(ref_dir)## Warning: Overwriting miscellanous data for model
## Warning: Adding a dimensional reduction (refUMAP) without the associated assay
## being present
## Warning: Adding a dimensional reduction (refUMAP) without the associated assay
## being present
# Load the query object for mapping
# Preprocess with SCTransform
srat <- RunAzimuth(srat, ref_dir, assay = 'RNA')## Warning: Overwriting miscellanous data for model
## Warning: Adding a dimensional reduction (refUMAP) without the associated assay
## being present
## Warning: Adding a dimensional reduction (refUMAP) without the associated assay
## being present
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## Warning: No layers found matching search pattern provided
## Warning: 1244 features of the features specified were not present in both the reference query assays.
## Continuing with remaining 1756 features.
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## Warning: Keys should be one or more alphanumeric characters followed by an
## underscore, setting key from integrated_dr_ to integrateddr_
## Warning in RunUMAP.default(object = neighborlist, reduction.model =
## reduction.model, : Number of neighbors between query and reference is not equal
## to the number of neighbors within reference
## Warning: No assay specified, setting assay as RNA by default.
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 15321910 818.3 22417003 1197.2 22417003 1197.2
## Vcells 110872190 845.9 312879332 2387.1 312877284 2387.1
DT::datatable(table(srat$seurat_clusters, srat$predicted.compartment), caption = ("table of azimuth compartment annotation per seurat clusters"), extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel')))DT::datatable(table(srat$seurat_clusters, srat$predicted.celltype), caption = ("table of azimuth annotation per seurat clusters"),
extensions = 'Buttons',
options = list( dom = 'Bfrtip',
buttons = c( 'csv', 'excel')))d1 <- DimPlot(srat, reduction = "umap", group.by = "seurat_clusters", label = TRUE) + NoLegend()
d2 <- DimPlot(srat, reduction = "umap", dims = c(1,2), group.by = "predicted.compartment", label = TRUE, repel = TRUE) + NoLegend()
d3 <- DimPlot(srat, reduction = "umap", dims = c(1,2), group.by = "predicted.celltype", label = TRUE, repel = TRUE) + NoLegend()
d1 | d2 | d3Looking at the predicted compartment, we are satisfied that the immune, endothelium and stroma compartment are specific to a set of clusters. It is expected that the fetal__nephron compartment match to both blastema and epithelial cancer and normal kidney. It is thus OK to have it labeling two distinct set of clusters.
Looking at the predicted cell types, we observed that the set of cluster [0,1,3,4,6,10,12,14] is labeled as cap mesenchyme, an early developmental stage of nephron, while the set of clusters [5,7,9] is labeled with more differentiated epithelial units of the kidney. Based on this observation and the expression of NCAM1, UNC5D (C1) and PDCH7 (C7), ERBB4 (C9), we can conclude that the set of cluster [0,1,3,4,6,10,12,14] is mostly blastema cancer cells and the set of clusters [5,7,9] is eipthelial cancer or normal cells.
f1 <- SCpubr::do_BarPlot(sample = srat,
group.by = "predicted.compartment",
split.by = "seurat_clusters",
position = "fill",
font.size = 10,
legend.ncol = 3) +
ggtitle("% cells")+
xlab(file)
f2 <- SCpubr::do_BarPlot(sample = srat,
group.by = "predicted.celltype",
split.by = "seurat_clusters",
position = "fill",
font.size = 10,
legend.ncol = 3) +
ggtitle("% cells")+
xlab(file)
f1 | f2Taking together, we can conclude:
C13 = endothelium (with high confidency)
C11 = immune cells (with high confidency)
C2 and C8, that form a unit in the umap reduction, are stromal cells, either cancer or normal (with high confidency)
C[0-1-3-4-6-10-12-14] that form a unit in the umap reduction, are label as cap mesenchyme (<=> stem) and NCAM+ must be blastema cancer cells.
C[5-7-9] that form a unit in the umap reduction, are label as epithelial cells (SingleR) and differentiated kidney epithelium subunit (Fetal kidney atlas) must be epithelial (cancer or normal)
The primary aim of this script was to assess the quality of clustering and play with clustering parameters. On this sample, we are happy with the clustering, as we could identify: - one cluster of immune cells (cell marker PTPRC, SingleR, Azimuth label transfer fro the fetal kidney) - one cluster of endothelial cells (cell marker VWF, SingleR, Azimuth label transfer fro the fetal kidney) - one set of clusters of stromal cells - one set of cluster of epithelial cells - one set of cluster of blastema cancer cells.
Additionally, we showed the feasibility to run label transfer using runAzimuth (part of step 3 of the proposed analysis: https://github.com/maud-p/OpenScPCA-analysis/issues/1).
We also showed the easy and robust identification of normal cells (endothelial and immune cells) that would serve as reference for the CNV inference (inferCNV, step 4 of the analysis).
[ ] We will adapt this template to be render on all the 40 Wilms tumor samples of the dataset.
[ ] We will save for each sample the rds file
[ ] We will run inferCNV for each sample to decide the malignant/normal status of some stroma and epithelial cluster, and confirm the blastema annotation.
The next step will provide us a better understanding of the entire cohort. We will then have to set up a strategy to annotate each sample. Open questions are:
[ ] should we annotate single cell
or
[] consider applying similar annotations to all cells in a cluster?
[ ] manual annotation of each cluster / each patient
or
[] automated annotation using some threshold?
# record the versions of the packages used in this analysis and other environment information
sessionInfo()## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Vienna
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Azimuth_0.5.0 shinyBS_0.61.1
## [3] DElegate_1.2.1 edgeR_4.2.1
## [5] limma_3.60.4 viridis_0.6.5
## [7] viridisLite_0.4.2 SCpubr_2.0.2.9000
## [9] ggplotify_0.1.2 patchwork_1.2.0
## [11] ggplot2_3.5.1 DT_0.33
## [13] dplyr_1.1.4 tidyr_1.3.1
## [15] Seurat_5.0.0 SeuratObject_5.0.2
## [17] sp_2.1-4 SingleCellExperiment_1.24.0
## [19] SummarizedExperiment_1.32.0 Biobase_2.62.0
## [21] GenomicRanges_1.54.1 GenomeInfoDb_1.38.8
## [23] IRanges_2.36.0 S4Vectors_0.40.2
## [25] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
## [27] matrixStats_1.3.0
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.4 ProtGenerics_1.34.0
## [3] spatstat.sparse_3.1-0 bitops_1.0-8
## [5] DirichletMultinomial_1.44.0 lubridate_1.9.3
## [7] TFBSTools_1.40.0 httr_1.4.7
## [9] RColorBrewer_1.1-3 tools_4.4.1
## [11] sctransform_0.4.1 utf8_1.2.4
## [13] R6_2.5.1 lazyeval_0.2.2
## [15] uwot_0.2.2 rhdf5filters_1.14.1
## [17] withr_3.0.0 prettyunits_1.2.0
## [19] gridExtra_2.3 progressr_0.14.0
## [21] cli_3.6.3 spatstat.explore_3.3-1
## [23] fastDummies_1.7.3 EnsDb.Hsapiens.v86_2.99.0
## [25] shinyjs_2.1.0 labeling_0.4.3
## [27] sass_0.4.9 spatstat.data_3.1-2
## [29] readr_2.1.5 ggridges_0.5.6
## [31] pbapply_1.7-2 Rsamtools_2.18.0
## [33] yulab.utils_0.1.5 R.utils_2.12.3
## [35] parallelly_1.38.0 BSgenome_1.70.2
## [37] rstudioapi_0.16.0 RSQLite_2.3.7
## [39] generics_0.1.3 gridGraphics_0.5-1
## [41] BiocIO_1.12.0 crosstalk_1.2.1
## [43] gtools_3.9.5 ica_1.0-3
## [45] spatstat.random_3.3-1 googlesheets4_1.1.1
## [47] GO.db_3.18.0 Matrix_1.7-0
## [49] fansi_1.0.6 abind_1.4-5
## [51] R.methodsS3_1.8.2 lifecycle_1.0.4
## [53] yaml_2.3.10 glmGamPoi_1.14.3
## [55] rhdf5_2.46.1 SparseArray_1.2.4
## [57] BiocFileCache_2.10.2 Rtsne_0.17
## [59] grid_4.4.1 blob_1.2.4
## [61] promises_1.3.0 shinydashboard_0.7.2
## [63] crayon_1.5.3 miniUI_0.1.1.1
## [65] lattice_0.22-6 cowplot_1.1.3
## [67] annotate_1.80.0 GenomicFeatures_1.54.4
## [69] KEGGREST_1.42.0 pillar_1.9.0
## [71] knitr_1.48 rjson_0.2.21
## [73] future.apply_1.11.2 codetools_0.2-20
## [75] fastmatch_1.1-4 leiden_0.4.3.1
## [77] glue_1.7.0 spatstat.univar_3.0-0
## [79] data.table_1.15.4 vctrs_0.6.5
## [81] png_0.1-8 spam_2.10-0
## [83] cellranger_1.1.0 poweRlaw_0.80.0
## [85] gtable_0.3.5 assertthat_0.2.1
## [87] cachem_1.1.0 xfun_0.46
## [89] Signac_1.13.0 S4Arrays_1.2.1
## [91] mime_0.12 pracma_2.4.4
## [93] survival_3.7-0 gargle_1.5.2
## [95] RcppRoll_0.3.1 statmod_1.5.0
## [97] fitdistrplus_1.2-1 ROCR_1.0-11
## [99] nlme_3.1-165 bit64_4.0.5
## [101] progress_1.2.3 filelock_1.0.3
## [103] RcppAnnoy_0.0.22 bslib_0.7.0
## [105] irlba_2.3.5.1 KernSmooth_2.23-24
## [107] SeuratDisk_0.0.0.9021 colorspace_2.1-1
## [109] seqLogo_1.68.0 DBI_1.2.3
## [111] tidyselect_1.2.1 bit_4.0.5
## [113] compiler_4.4.1 curl_5.2.1
## [115] hdf5r_1.3.11 xml2_1.3.6
## [117] DelayedArray_0.28.0 plotly_4.10.4
## [119] rtracklayer_1.62.0 caTools_1.18.2
## [121] scales_1.3.0 lmtest_0.9-40
## [123] rappdirs_0.3.3 stringr_1.5.1
## [125] digest_0.6.36 goftest_1.2-3
## [127] presto_1.0.0 spatstat.utils_3.0-5
## [129] rmarkdown_2.27 XVector_0.42.0
## [131] htmltools_0.5.8.1 pkgconfig_2.0.3
## [133] sparseMatrixStats_1.14.0 highr_0.11
## [135] dbplyr_2.5.0 fastmap_1.2.0
## [137] ensembldb_2.26.0 rlang_1.1.4
## [139] htmlwidgets_1.6.4 DelayedMatrixStats_1.24.0
## [141] shiny_1.8.1.1 farver_2.1.2
## [143] jquerylib_0.1.4 zoo_1.8-12
## [145] jsonlite_1.8.8 BiocParallel_1.36.0
## [147] R.oo_1.26.0 RCurl_1.98-1.16
## [149] magrittr_2.0.3 GenomeInfoDbData_1.2.11
## [151] dotCall64_1.1-1 Rhdf5lib_1.24.2
## [153] munsell_0.5.1 Rcpp_1.0.13
## [155] reticulate_1.38.0 stringi_1.8.4
## [157] zlibbioc_1.48.2 MASS_7.3-61
## [159] plyr_1.8.9 parallel_4.4.1
## [161] listenv_0.9.1 ggrepel_0.9.5
## [163] forcats_1.0.0 CNEr_1.38.0
## [165] deldir_2.0-4 Biostrings_2.70.3
## [167] splines_4.4.1 tensor_1.5
## [169] hms_1.1.3 locfit_1.5-9.10
## [171] BSgenome.Hsapiens.UCSC.hg38_1.4.5 igraph_2.0.3
## [173] spatstat.geom_3.3-2 RcppHNSW_0.6.0
## [175] reshape2_1.4.4 biomaRt_2.58.2
## [177] TFMPvalue_0.0.9 XML_3.99-0.17
## [179] evaluate_0.24.0 tzdb_0.4.0
## [181] JASPAR2020_0.99.10 httpuv_1.6.15
## [183] RANN_2.6.1 purrr_1.0.2
## [185] polyclip_1.10-7 future_1.34.0
## [187] SeuratData_0.2.2.9001 scattermore_1.2
## [189] xtable_1.8-4 restfulr_0.0.15
## [191] AnnotationFilter_1.26.0 RSpectra_0.16-2
## [193] later_1.3.2 googledrive_2.1.1
## [195] tibble_3.2.1 memoise_2.0.1
## [197] AnnotationDbi_1.64.1 GenomicAlignments_1.38.2
## [199] cluster_2.1.6 timechange_0.3.0
## [201] globals_0.16.3